Types of Learning


Roadmap
① When Can Machines Learn?

Lecture 2: Learning to Answer Yes/No
PLA A takes linearly separable D and perceptrons H to get hypothesis g

Lecture 3: Types of Learning
• Learning with Different Output Space Y
• Learning with Different Data Label y_n
• Learning with Different Protocol f => (x_n, y_n)
• Learning with Different Input Space X

② Why Can Machines Learn?
③ How Can Machines Learn?
④ How Can Machines Learn Better?
Types of Learning: Learning with Different Output Space Y

Credit Approval Problem Revisited

age                23 years
gender             female
annual salary      NTD 1,000,000
year in residence  1 year
year in job        0.5 year
current debt       200,000

[Figure: learning flow. Unknown target function f: X -> Y (ideal credit approval formula) generates training examples D: (x_1, y_1), ..., (x_N, y_N) (historical records in bank). Learning algorithm A picks from hypothesis set H (set of candidate formulas) the final hypothesis g ≈ f (‘learned’ formula to be used). Output: credit? {no(-1), yes(+1)}]

Y = {-1, +1}: binary classification




More Binary Classification Problems 
• credit approve/disapprove
• email spam/non-spam
• patient sick/not sick
• ad profitable/not profitable
• answer correct/incorrect (KDDCup 2010)

core and important problem with many tools as building block of other tools
Multiclass Classification: Coin Recognition Problem

• classify US coins (1c, 5c, 10c, 25c) by (size, mass)
• Y = {1c, 5c, 10c, 25c}, or Y = {1, 2, ..., K} (abstractly)
• binary classification: special case with K = 2














Other Multiclass Classification Problems 
• written digits => 0, 1, ..., 9
• pictures => apple, orange, strawberry
• emails => spam, primary, social, promotion, update (Google)

many applications in practice, especially for ‘recognition’
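
As a rough illustration of multiclass classification on the coin problem, the sketch below uses a nearest-centroid rule, which is a stand-in technique and not one the slides prescribe. The (size, mass) numbers are hypothetical values invented for the example, and the labels 1, 5, 10, 25 play the role of Y.

```python
from math import dist

def fit_centroids(examples):
    """Average the (size, mass) vectors of each class label."""
    sums, counts = {}, {}
    for x, y in examples:
        s = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: tuple(v / counts[y] for v in s) for y, s in sums.items()}

def predict(centroids, x):
    """Return the label in Y whose centroid is closest to x."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Hypothetical (size, mass) measurements; not real coin specifications.
training = [
    ((19.0, 2.5), 1), ((19.1, 2.6), 1),
    ((21.2, 5.0), 5), ((21.3, 5.1), 5),
    ((17.9, 2.2), 10), ((18.0, 2.3), 10),
    ((24.2, 5.6), 25), ((24.3, 5.7), 25),
]
centroids = fit_centroids(training)
label = predict(centroids, (21.0, 4.9))  # one of the K = 4 classes
```

With K = 2 labels the same code would be a binary classifier, which matches the remark that binary classification is the special case K = 2.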
Regression: Patient Recovery Prediction Problem

• binary classification: patient features => sick or not
• multiclass classification: patient features => which type of cancer
• regression: patient features => how many days before recovery
• Y = R or Y = [lower, upper] ⊂ R (bounded regression), deeply studied in statistics

Other Regression Problems
• company data => stock price
• climate data => temperature

also core and important with many ‘statistical’ tools as building block of other tools
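
To make the regression setting concrete, here is a minimal least-squares sketch with a single real-valued feature. The "severity" feature, the data values, and the bounds 0 and 365 are all assumptions made up for illustration; clipping the prediction is one simple way to picture bounded regression with Y = [lower, upper].

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ a * x + b for one real-valued feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

# Hypothetical training examples: severity score -> days before recovery.
severity = [1.0, 2.0, 3.0, 4.0]
days = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(severity, days)

def g(x, lower=0.0, upper=365.0):
    """Final hypothesis, clipped to the assumed range [lower, upper]."""
    return min(max(a * x + b, lower), upper)
```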
Structured Learning: Sequence Tagging Problem

Example: ‘I love ML’ is tagged as (pronoun, verb, noun).

• multiclass classification: word => word class
• structured learning: sentence => structure (class of each word)
• Y = {PVN, PVP, NVN, PV, ...}, not including VVVVV
• huge multiclass classification problem (structure ≡ hyperclass) without ‘explicit’ class definition

Other Structured Learning Problems
• protein data => protein folding
• speech data => speech parse tree

a fancy but complicated learning problem
Mini Summary

Learning with Different Output Space Y
• binary classification: Y = {-1, +1}
• multiclass classification: Y = {1, 2, ..., K}
• regression: Y = R
• structured learning: Y = structures
• ... and a lot more!!

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

core tools: binary classification and regression


Fun Time

What is this learning problem?

The entrance system of the school gym, which does automatic face
recognition based on machine learning, is built to charge four different
groups of users differently: Staff, Student, Professor, Other. What type
of learning problem best fits the need of the system?

① binary classification
② multiclass classification
③ regression
④ structured learning











Reference Answer: ②

There is an ‘explicit’ Y that contains four classes.
Types of Learning: Learning with Different Data Label y_n

Supervised: Coin Recognition Revisited

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

supervised learning: every x_n comes with corresponding y_n


Unsupervised: Coin Recognition without y_n

[Figure: two panels, supervised multiclass classification and unsupervised multiclass classification]

unsupervised multiclass classification <=> ‘clustering’

Other Clustering Problems
• articles => topics
• consumer profiles => consumer groups

clustering: a challenging but useful problem
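
One common way to cluster unlabeled coins is k-means; the slides do not name a specific algorithm, so treat the sketch below as one possible choice. The points and the initial centers are hypothetical, and no y_n is used anywhere.

```python
from math import dist

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda k: dist(p, centers[k]))
            for p in points]

def kmeans(points, centers, rounds=10):
    """Plain k-means: alternate between assignment and center update."""
    centers = list(centers)
    for _ in range(rounds):
        labels = assign(points, centers)
        for k in range(len(centers)):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:  # keep the old center if a cluster is empty
                centers[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, assign(points, centers)

# Hypothetical unlabeled (size, mass) points: only x_n, no y_n.
points = [(19.0, 2.5), (19.2, 2.6), (24.2, 5.6), (24.4, 5.7)]
centers, labels = kmeans(points, centers=[points[0], points[2]])
```

The cluster indices in `labels` carry no meaning beyond grouping, which is part of why clustering is harder to evaluate than supervised classification.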












Unsupervised: Learning without y_n

Other Unsupervised Learning Problems
• clustering: {x_n} => cluster(x) (≈ ‘unsupervised multiclass classification’), e.g. articles => topics
• density estimation: {x_n} => density(x) (≈ ‘unsupervised bounded regression’), e.g. traffic reports with location => dangerous areas
• outlier detection: {x_n} => unusual(x) (≈ extreme ‘unsupervised binary classification’), e.g. Internet logs => intrusion alert
• ... and a lot more!!

unsupervised learning: diverse, with possibly very different performance goals
Semi-supervised: Coin Recognition with Some y_n

[Figure: coin recognition panels; surviving panel title: unsupervised (clustering)]

Other Semi-supervised Learning Problems
• face images with a few labeled => face identifier (Facebook)
• medicine data with a few labeled => medicine effect predictor

semi-supervised learning: leverage unlabeled data to avoid ‘expensive’ labeling
Reinforcement Learning
a ‘very different’ but natural way of learning

Teach Your Dog: Say ‘Sit Down’

The dog pees on the ground.
BAD DOG. THAT’S A VERY WRONG ACTION.

• cannot easily show the dog that y_n = sit when x_n = ‘sit down’
• but can ‘punish’ to say ỹ_n = pee is wrong
The dog sits down.
Good Dog. Let me give you some cookies.

• still cannot show y_n = sit when x_n = ‘sit down’
• but can ‘reward’ to say ỹ_n = sit is good

Other Reinforcement Learning Problems Using (x, ỹ, goodness)
• (customer, ad choice, ad click earning) => ad system
• (cards, strategy, winning amount) => black jack agent

reinforcement: learn with ‘partial/implicit information’ (often sequentially)
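
The ad system example can be pictured as learning from (ỹ, goodness) feedback only. The sketch below is an epsilon-greedy bandit, a swapped-in textbook technique rather than anything the slides specify; it ignores the customer x for brevity, and the ad names, the `earning` function, and all parameter values are hypothetical.

```python
import random

def run_ad_system(ads, earning, rounds=200, epsilon=0.1, seed=0):
    """Learn which ad to show using only the goodness of the chosen action."""
    rng = random.Random(seed)
    value = {ad: 0.0 for ad in ads}   # running average of observed goodness
    count = {ad: 0 for ad in ads}
    for t in range(rounds):
        if t < len(ads):
            choice = ads[t]                    # try every ad once
        elif rng.random() < epsilon:
            choice = rng.choice(ads)           # explore occasionally
        else:
            choice = max(ads, key=value.get)   # exploit the best estimate
        goodness = earning(choice)             # no 'correct' label is ever shown
        count[choice] += 1
        value[choice] += (goodness - value[choice]) / count[choice]
    return value

# Hypothetical environment: only ad "B" ever earns anything.
value = run_ad_system(["A", "B", "C"], lambda ad: 1.0 if ad == "B" else 0.0)
best_ad = max(value, key=value.get)
```

The learner never sees which ad would have been best for a given round; it only sees how good its own choice was, and it has to make choices sequentially to get that feedback.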
Mini Summary

Learning with Different Data Label y_n
• supervised: all y_n
• unsupervised: no y_n
• semi-supervised: some y_n
• reinforcement: implicit y_n by goodness(ỹ_n)
• ... and more!!

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

core tool: supervised learning


Fun Time

What is this learning problem?

To build a tree recognition system, a company decides to gather one
million pictures on the Internet. Then, it asks each of the 10
company members to view 100 pictures and record whether each
picture contains a tree. The pictures and records are then fed to a
learning algorithm to build the system. What type of learning problem
does the algorithm need to solve?

① supervised
② unsupervised
③ semi-supervised
④ reinforcement
Reference Answer: ③

The 1,000 records are the labeled (x_n, y_n); the other 999,000 pictures are the unlabeled x_n.
Types of Learning: Learning with Different Protocol f => (x_n, y_n)

Batch Learning: Coin Recognition Revisited

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

batch supervised multiclass classification: learn from all known data


More Batch Learning Problems
• batch of (email, spam?) => spam filter
• batch of (patient, cancer) => cancer classifier
• batch of patient data => group of patients

batch learning: a very common protocol







Online: Spam Filter that ‘Improves’

• batch spam filter: learn with known (email, spam?) pairs, and predict with fixed g
• online spam filter, which sequentially:
  ① observe an email x_t
  ② predict spam status with current g_t(x_t)
  ③ receive ‘desired label’ y_t from user, and then update g_t with (x_t, y_t)

Connection to What We Have Learned
• PLA can be easily adapted to online protocol (how?)
• reinforcement learning is often done online (why?)

online: hypothesis ‘improves’ through receiving data instances sequentially
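
One possible answer to the "(how?)" for PLA is sketched below: apply the usual PLA correction w <- w + y_t x_t whenever the current hypothesis g_t misclassifies the incoming example, so each example is seen once in arrival order. The two-feature "emails" in the stream are hypothetical, and sign(0) is taken as -1 by assumption.

```python
def sign(v):
    """Sign function; by assumption a score of 0 counts as -1."""
    return 1 if v > 0 else -1

def online_pla(stream, dim):
    """Process (x_t, y_t) pairs one at a time, PLA-style."""
    w = [0.0] * (dim + 1)            # w[0] is the bias weight
    mistakes = 0
    for x, y in stream:
        x = [1.0] + list(x)          # prepend constant feature x_0 = 1
        prediction = sign(sum(wi * xi for wi, xi in zip(w, x)))
        if prediction != y:          # update only when g_t(x_t) is wrong
            w = [wi + y * xi for wi, xi in zip(w, x)]
            mistakes += 1
    return w, mistakes

# Hypothetical stream of (features, desired label) pairs, +1 = spam.
stream = [((2.0, 0.0), +1), ((0.0, 2.0), -1), ((3.0, 1.0), +1), ((1.0, 3.0), -1)]
w, mistakes = online_pla(stream, dim=2)
```

Unlike the batch version, this loop never revisits earlier examples, so it makes no claim of ending with zero training errors.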
Active Learning: Learning by ‘Asking’

Protocol <=> Learning Philosophy
• batch: ‘duck feeding’
• online: ‘passive sequential’
• active: ‘question asking’ (sequentially), i.e. query the y_n of the chosen x_n

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

active: improve hypothesis with fewer labels (hopefully) by asking questions strategically
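
A toy case where asking strategically clearly saves labels is a one-dimensional threshold target: all points left of some boundary are -1 and the rest are +1. Under that assumption (which is mine, not the slides'), binary search over a sorted pool needs about log2(N) queries, where a batch learner would label all N points. The pool and the `ask` oracle below are hypothetical.

```python
def active_threshold(pool, ask):
    """Locate the first +1 point in a sorted 1-D pool by querying labels."""
    pool = sorted(pool)
    lo, hi = 0, len(pool)             # index of the first +1 point lies in [lo, hi]
    queries = 0
    while lo < hi:
        mid = (lo + hi) // 2
        queries += 1
        if ask(pool[mid]) == +1:      # query the y_n of the chosen x_n
            hi = mid
        else:
            lo = mid + 1
    return lo, queries                # lo == len(pool) means no +1 point exists

# Hypothetical oracle standing in for the human labeler.
pool = list(range(100))
boundary, queries = active_threshold(pool, lambda x: +1 if x >= 37 else -1)
```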
Mini Summary

Learning with Different Protocol f => (x_n, y_n)
• batch: all known data
• online: sequential (passive) data
• active: strategically-observed data
• ... and more!!

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

core protocol: batch


Fun Time

What is this learning problem?

A photographer has 100,000 pictures, each containing one baseball
player. He wants to automatically categorize the pictures by the player
in each. He starts by categorizing 1,000 pictures by himself, and then
writes an algorithm that tries to categorize the other pictures if it is
‘confident’ on the category while pausing for (& learning from) human
input if not. What protocol best describes the nature of the algorithm?

① batch
② online
③ active
④ random
Reference Answer: ③

The algorithm takes an active but naïve strategy: ask when ‘confused’. You should probably do the same when taking a class. :-)
Types of Learning: Learning with Different Input Space X

Credit Approval Problem Revisited

age                23 years
gender             female
annual salary      NTD 1,000,000
year in residence  1 year
year in job        0.5 year
current debt       200,000

[Figure: learning flow. Unknown target function f: X -> Y (ideal credit approval formula) generates training examples D: (x_1, y_1), ..., (x_N, y_N) (historical records in bank). Learning algorithm A picks from hypothesis set H (set of candidate formulas) the final hypothesis g ≈ f (‘learned’ formula to be used).]

concrete features: each dimension of X ⊆ R^d represents ‘sophisticated physical meaning’


More on Concrete Features

• (size, mass) for coin classification
• customer info for credit approval
• patient info for cancer diagnosis
• often including ‘human intelligence’ on the learning task

[Figure: coin plot; surviving axis label: Size]

concrete features: the ‘easy’ ones for ML


Raw Features: Digit Recognition Problem

[Figure: sample images of handwritten digits]

• digit recognition problem: features => meaning of digit
• a typical supervised multiclass classification problem
Raw Features: Digit Recognition Problem (2/2)

by Concrete Features
• x = (symmetry, density)

by Raw Features
• 16 by 16 gray image x ≡ (0, 0, 0.9, 0.6, ...) ∈ R^256
• ‘simple physical meaning’; thus more difficult for ML than concrete features

Other Problems with Raw Features
• image pixels, speech signal, etc.

raw features: often need human or machines to convert to concrete ones
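
The conversion from raw pixels to x = (symmetry, density) could look roughly like the sketch below. The slides do not define either feature, so both formulas are assumptions: density is taken as the mean pixel intensity, and symmetry as the negated mean absolute difference between the image and its left-right mirror (0 means perfectly symmetric). The two stroke images are synthetic.

```python
def density(image):
    """Assumed definition: mean intensity over all pixels."""
    pixels = [p for row in image for p in row]
    return sum(pixels) / len(pixels)

def symmetry(image):
    """Assumed definition: negated mean |pixel - mirrored pixel| (left-right)."""
    diffs = [abs(p - q) for row in image for p, q in zip(row, reversed(row))]
    return -sum(diffs) / len(diffs)

def to_concrete(image):
    """Map a raw gray image (list of rows) to x = (symmetry, density)."""
    return (symmetry(image), density(image))

# Synthetic 16 by 16 images: a centered two-pixel-wide stroke and an off-center one.
centered = [[1.0 if c in (7, 8) else 0.0 for c in range(16)] for _ in range(16)]
off_center = [[1.0 if c == 2 else 0.0 for c in range(16)] for _ in range(16)]
x_centered = to_concrete(centered)
x_off_center = to_concrete(off_center)
```

Each image goes from 256 raw numbers to 2 numbers that a human chose because they seem relevant to telling digits apart.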
Abstract Features: Rating Prediction Problem

Rating Prediction Problem (KDDCup 2011)
• given previous (userid, itemid, rating) tuples, predict the rating that some userid would give to itemid?
• a regression problem with Y ⊆ R as rating and X ⊆ N × N as (userid, itemid)
• ‘no physical meaning’; thus even more difficult for ML

Other Problems with Abstract Features
• student ID in online tutoring system (KDDCup 2010)
• advertisement ID in online ad system

abstract: again need ‘feature conversion/extraction/construction’
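
One simple form of feature construction for abstract IDs is to replace each ID by statistics computed from past ratings. The sketch below maps (userid, itemid) to (mean rating by that user, mean rating of that item); this particular choice, the fallback to the overall mean for unseen IDs, and the sample tuples are all assumptions for illustration.

```python
from collections import defaultdict

def build_features(ratings):
    """Turn abstract IDs into concrete features via per-user and per-item means."""
    by_user, by_item = defaultdict(list), defaultdict(list)
    for user, item, r in ratings:
        by_user[user].append(r)
        by_item[item].append(r)
    overall = sum(r for _, _, r in ratings) / len(ratings)
    user_mean = {u: sum(v) / len(v) for u, v in by_user.items()}
    item_mean = {i: sum(v) / len(v) for i, v in by_item.items()}

    def features(user, item):
        # Unseen IDs fall back to the overall mean rating.
        return (user_mean.get(user, overall), item_mean.get(item, overall))

    return features

# Hypothetical (userid, itemid, rating) tuples.
ratings = [(1, 10, 4.0), (1, 11, 2.0), (2, 10, 5.0)]
features = build_features(ratings)
x = features(2, 11)
```

The resulting pair has some physical meaning (how generous the user is, how well liked the item is), so it can be fed to an ordinary regression method.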
Mini Summary

Learning with Different Input Space X
• concrete: sophisticated (and related) physical meaning
• raw: simple physical meaning
• abstract: no (or little) physical meaning
• ... and more!!

[Figure: learning flow diagram: unknown target function f: X -> Y; training examples D: (x_1, y_1), ..., (x_N, y_N); learning algorithm A; hypothesis set H; final hypothesis g ≈ f]

‘easy’ input: concrete


Fun Time

What features can be used?

Consider a problem of building an online image advertisement system
that shows the users the most relevant images. What features can you
choose to use?

① concrete
② concrete, raw
③ concrete, abstract
④ concrete, raw, abstract
Reference Answer: ④

concrete user features, raw image features, and maybe abstract user/image IDs
Summary
① When Can Machines Learn?

Lecture 2: Learning to Answer Yes/No
Lecture 3: Types of Learning
• Learning with Different Output Space Y: [classification], [regression], structured
• Learning with Different Data Label y_n: [supervised], un/semi-supervised, reinforcement
• Learning with Different Protocol f => (x_n, y_n): [batch], online, active
• Learning with Different Input Space X: [concrete], raw, abstract
• next: learning is impossible?!

② Why Can Machines Learn?
③ How Can Machines Learn?
④ How Can Machines Learn Better?